Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)


Ready-built human databases are available from the ANNOVAR server, the UCSC Genome Browser
website, or third parties and can be downloaded with the “annotate_variation.pl” program.
For non-human species that have no ready-built annotation database, a database can
be built from the FASTA sequence of the reference genome and the GFF/GTF annotation
file of that species. Both files can be downloaded from databases such as the NCBI
Genome database or the UCSC database.
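As a rough illustration of building a database from a GTF file and a genome FASTA, the sketch below prints the two commands that are commonly used for this purpose; it does not execute them. It assumes that gtfToGenePred (a UCSC utility that is not part of ANNOVAR) is installed, and the prefix “sp”, the file names, and the directory “spdb” are hypothetical placeholders. Check the flags against the ANNOVAR documentation for your version before running the printed commands.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the commands for building an ANNOVAR gene database
# from a GTF annotation file and a reference genome FASTA.
# Assumptions: gtfToGenePred is a separately installed UCSC utility;
# the prefix, file names, and output directory are hypothetical placeholders.
build_db_commands() {
    local prefix="$1"   # build name later passed to -buildver
    local gtf="$2"      # GTF annotation file of the species
    local fasta="$3"    # reference genome FASTA of the species
    local dbdir="$4"    # directory that will hold the database files
    # Step 1: convert the GTF annotation into genePred format.
    printf 'gtfToGenePred -genePredExt %s %s/%s_refGene.txt\n' \
        "$gtf" "$dbdir" "$prefix"
    # Step 2: extract the transcript sequences from the genome FASTA.
    printf 'retrieve_seq_from_fasta.pl --format refGene --seqfile %s %s/%s_refGene.txt --out %s/%s_refGeneMrna.fa\n' \
        "$fasta" "$dbdir" "$prefix" "$dbdir" "$prefix"
}

build_db_commands sp genes.gtf genome.fa spdb
```

Printing the commands first makes it easy to review paths and file names before anything is written to disk.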

As shown in Table 4.4, ANNOVAR consists of six Perl scripts that can be run as
command-line programs on any computer with Perl installed. The download instructions are
available at “https://annovar.openbioinformatics.org/en/latest/user-guide/download/”.
You may be asked to register with your school email address. The download link will then be
emailed to you, and you can download the compressed file onto your computer and decompress
it with the “tar xvf” command. If you are using Linux, you can add ANNOVAR to the path by
adding the following line to the end of the “.bashrc” file:

export PATH="YOURPATH/annovar:$PATH"
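The same step can be scripted so that the line is appended only once, even if the script is run repeatedly. The following is a minimal sketch; the function name add_annovar_to_path, the default directory “$HOME/tools/annovar”, and the archive name in the commented usage lines are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: append an "export PATH=..." line for ANNOVAR to a startup file,
# skipping the append if the identical line is already present.
# ANNOVAR_DIR and RC_FILE are placeholders; adjust them for your system.
ANNOVAR_DIR="${ANNOVAR_DIR:-$HOME/tools/annovar}"
RC_FILE="${RC_FILE:-$HOME/.bashrc}"

add_annovar_to_path() {
    local dir="$1"
    local rc="$2"
    # \$PATH stays literal so it is expanded when the startup file is read.
    local line="export PATH=\"$dir:\$PATH\""
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF "$line" "$rc" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc"
    fi
}

# Usage (uncomment after checking the paths; the archive name is an assumption):
# tar xvf annovar.latest.tar.gz -C "$HOME/tools"
# add_annovar_to_path "$ANNOVAR_DIR" "$RC_FILE"
# source "$RC_FILE" && command -v annotate_variation.pl
```

After reloading the startup file, “command -v annotate_variation.pl” should print the script's location if the path was set correctly.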

4.3.3.1 Annotation Databases

For variant annotation, ANNOVAR uses annotation databases for the organism of interest,
which must first be downloaded into a local directory. Databases can be downloaded from
the UCSC Genome Browser, the 1000 Genomes Project, the ANNOVAR website, or a third-party
URL. You can use “annotate_variation.pl” to annotate variants, download a database, or list
the databases available for a specific genome build. The general syntax is as follows:

annotate_variation.pl \
    [arguments] \
    <query-file|table-name> \
    <database-location>

For the complete list of arguments, run:

annotate_variation.pl -h
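To make the general syntax concrete, the sketch below composes two command lines that follow it: one in download mode, where the positional argument is a table name, and one in annotation mode, where it is a query file. The commands are printed rather than executed. The directory “humandb/” and the query file “sample.avinput” are hypothetical names, and the helper functions download_cmd and annotate_cmd are introduced here only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compose annotate_variation.pl command lines of the form
#   annotate_variation.pl [arguments] <query-file|table-name> <database-location>
# The commands are printed, not run.
# Assumptions: "humandb/" and "sample.avinput" are hypothetical names.
BUILD="hg19"
DB_DIR="humandb/"

# Download mode: the positional argument is a table name (for example refGene).
download_cmd() {
    printf 'annotate_variation.pl -buildver %s -downdb -webfrom annovar %s %s\n' \
        "$BUILD" "$1" "$DB_DIR"
}

# Annotation mode: the positional argument is a query file.
annotate_cmd() {
    printf 'annotate_variation.pl -geneanno -dbtype refGene -buildver %s %s %s\n' \
        "$BUILD" "$1" "$DB_DIR"
}

download_cmd refGene
annotate_cmd sample.avinput
```

In both cases the last argument is the database location, which is where downloaded tables are stored and later read during annotation.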

To list the available annotation databases for the hg19 build of the human reference genome,

you can run the following command:

TABLE 4.4 ANNOVAR Script Files

ANNOVAR Program               Description
annotate_variation.pl         The core ANNOVAR program for annotation and database download
coding_change.pl              To calculate the mutated sequence and make inference
convert2annovar.pl            To convert genotype-calling file format into ANNOVAR input format
retrieve_seq_from_fasta.pl    To retrieve genomic nucleotide, cDNA sequences, or translated amino acid sequences from FASTA file
table_annovar.pl              To generate a tab-delimited output file with annotation columns
variants_reduction.pl         For prioritizing causal variants